Scalable parallel algorithms for surface fitting and data mining

Authors

  • Peter Christen
  • Markus Hegland
  • Ole Møller Nielsen
  • Stephen G. Roberts
  • Peter E. Strazdins
  • Irfan Altas
Abstract

This paper presents scalable parallel algorithms for high-dimensional surface fitting and predictive modelling, which are used in data mining applications. These algorithms are based on techniques such as finite elements, thin plate splines, wavelets and additive models. They all consist of two steps. First, data is read from secondary storage and a linear system is assembled. Second, the linear system is solved. The assembly can be done with almost no communication, and the size of the linear system is independent of the data size. Thus the presented algorithms scale with both the data size and the number of processors.
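The two-step structure described in the abstract can be illustrated with a minimal one-dimensional sketch. Everything specific here is an assumption made for illustration, not the paper's method: the piecewise-linear "hat" basis on a fixed grid, the second-difference penalty, the dense Gaussian elimination, and all function names (`hat_basis`, `assemble`, `reduce_sum`, `add_penalty`, `solve`, `fit`) are hypothetical. The point it tries to show is that each chunk of data produces a system of the same fixed size, so the chunks can be processed independently and combined with a single sum.

```python
def hat_basis(x, m):
    """Return (index, weight) pairs of the piecewise-linear basis functions
    that are non-zero at x, for m equally spaced nodes on [0, 1]."""
    h = 1.0 / (m - 1)
    i = min(int(x / h), m - 2)   # left node of the cell containing x
    t = x / h - i                # local coordinate within the cell
    return [(i, 1.0 - t), (i + 1, t)]

def assemble(chunk, m):
    """Step 1: assemble local normal equations (A, b) from one data chunk.
    The result is m x m no matter how many points the chunk holds."""
    A = [[0.0] * m for _ in range(m)]
    b = [0.0] * m
    for x, y in chunk:
        basis = hat_basis(x, m)
        for i, wi in basis:
            b[i] += wi * y
            for j, wj in basis:
                A[i][j] += wi * wj
    return A, b

def reduce_sum(parts, m):
    """Sum the per-chunk systems; a stand-in for a parallel reduction."""
    A = [[0.0] * m for _ in range(m)]
    b = [0.0] * m
    for Ap, bp in parts:
        for i in range(m):
            b[i] += bp[i]
            for j in range(m):
                A[i][j] += Ap[i][j]
    return A, b

def add_penalty(A, alpha):
    """Add alpha times a second-difference roughness penalty (assumed choice)."""
    m = len(A)
    for k in range(1, m - 1):
        stencil = [(k - 1, 1.0), (k, -2.0), (k + 1, 1.0)]
        for i, ci in stencil:
            for j, cj in stencil:
                A[i][j] += alpha * ci * cj

def solve(A, b):
    """Step 2: solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]   # augmented copy
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = s / M[r][r]
    return x

def fit(chunks, m=9, alpha=1e-6):
    """Assemble per chunk (independent work), reduce, penalise, solve."""
    parts = [assemble(chunk, m) for chunk in chunks]
    A, b = reduce_sum(parts, m)
    add_penalty(A, alpha)
    return solve(A, b)

# Usage: 1000 samples of y = 2x + 1, split across four hypothetical workers.
points = [(i / 999.0, 2.0 * (i / 999.0) + 1.0) for i in range(1000)]
chunks = [points[k::4] for k in range(4)]
coeffs = fit(chunks)   # fitted values at the 9 grid nodes
```

In a real parallel setting the list comprehension in `fit` would run on separate processors and `reduce_sum` would become a collective reduction; the only data exchanged is the fixed-size pair `(A, b)`, which is why the communication cost does not grow with the number of data points.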


Similar resources

Scalable Data Mining for Rules

Data Mining is the process of automatic extraction of novel, useful, and understandable patterns in very large databases. High-performance scalable and parallel computing is crucial for ensuring system scalability and interactivity as datasets grow inexorably in size and complexity. This thesis deals with both the algorithmic and systems aspects of scalable and parallel data mining algorithms a...


Efficient Data Mining: Scripting and Scalable Parallel Algorithms

This paper presents our approach to data mining that allows the coupling of parallel applications with a scripting language resulting in an efficient and flexible toolbox. Parallel algorithms which are scalable both in data size and number of processors are a key issue to be able to solve the ever increasing problems in data mining. On the other hand, data mining applications should be flexible...


Parallel Algorithms for Predictive Modelling

Parallel computing enables the analysis of very large data sets using large collections of flexible models with many variables. The computational methods are based on ideas from computational linear algebra and can draw on the extensive research on parallel algorithms in this area. Many algorithms for the direct and iterative solution of penalised least squares problems and for updating can be ...


SPRINT: A Scalable Parallel Classifier for Data Mining

Classification is an important data mining problem. Although classification is a well-studied problem, most of the current classification algorithms require that all or a portion of the entire dataset remain permanently in memory. This limits their suitability for mining over large databases. We present a new decision-tree-based classification algorithm, called SPRINT, that removes all of the...


SPRINT: A Scalable Parallel Classifier for Data Mining

Classification is an important data mining problem. Although classification is a well-studied problem, most of the current classification algorithms require that all or a portion of the entire dataset remain permanently in memory. This limits their suitability for mining over large databases. We present a new decision-tree-based classification algorithm, called SPRINT, that removes all of the memo...



Journal title:
  • Parallel Computing

Volume 27  Issue 

Pages  -

Publication date 2001